AITopics | Tiaret Province

Tiaret Province

A Novel Double Pruning method for Imbalanced Data using Information Entropy and Roulette Wheel Selection for Breast Cancer Diagnosis

Bacha, Soufiane, Ning, Huansheng, Mostefa, Belarbi, Sarwatt, Doreen Sebastian, Dhelim, Sahraoui

arXiv.org Artificial Intelligence, Mar-15-2025

Accurate illness diagnosis is vital for effective treatment and patient safety. Machine learning models are widely used for cancer diagnosis based on historical medical data. However, data imbalance remains a major challenge, hindering classifier performance and reliability. The SMOTEBoost method addresses this issue by generating synthetic data to balance the dataset, but it may overlook crucial overlapping regions near the decision boundary and can produce noisy samples. This paper proposes RE-SMOTEBoost, an enhanced version of SMOTEBoost designed to overcome these limitations. First, RE-SMOTEBoost uses roulette wheel selection to generate synthetic samples in overlapping regions, so that the decision boundary is captured more accurately. Second, it incorporates a filtering mechanism based on information entropy to reduce noise and borderline cases and to improve the quality of the generated data. Third, it introduces a double regularization penalty to control the synthetic samples' proximity to the decision boundary and avoid class overlap. These enhancements enable higher-quality oversampling of the minority class, resulting in a more balanced and effective training dataset. The proposed method outperforms existing state-of-the-art techniques when evaluated on imbalanced datasets. Compared to the top-performing sampling algorithms, RE-SMOTEBoost demonstrates an improvement of 3.22% in accuracy and a variance reduction of 88.8%. These results indicate that the proposed model offers a solid solution for medical settings, effectively overcoming data scarcity and severe imbalance caused by limited samples, data collection difficulties, and privacy constraints.
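The two building blocks named in the abstract, roulette wheel selection and information entropy, can be illustrated with a minimal sketch. This is not the paper's procedure: the function names, the idea of using an "overlap score" as the selection weight, and the use of neighbour labels for the entropy are assumptions made only for illustration.

```python
import bisect
import itertools
import math
import random
from collections import Counter


def label_entropy(labels):
    """Shannon entropy (in bits) of the class labels in a neighbourhood.

    0.0 means all neighbours share one class; higher values mean more mixing.
    """
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def roulette_wheel_select(weights, k, rng=None):
    """Draw k indices with probability proportional to non-negative weights."""
    rng = rng or random.Random()
    if not weights or any(w < 0 for w in weights):
        raise ValueError("weights must be a non-empty list of non-negative numbers")
    cumulative = list(itertools.accumulate(weights))
    total = cumulative[-1]
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    picks = []
    for _ in range(k):
        r = rng.random() * total  # spin the wheel: a point in [0, total)
        idx = bisect.bisect_right(cumulative, r)
        # Guard against floating-point rounding at the upper edge.
        picks.append(min(idx, len(weights) - 1))
    return picks


# Hypothetical overlap scores for three minority samples: sample 0 lies far
# from the boundary (score 0), sample 2 lies deepest in the overlap region.
overlap_scores = [0.0, 1.0, 3.0]
seeds = roulette_wheel_select(overlap_scores, k=5, rng=random.Random(0))
```

Samples with larger scores are chosen more often as seeds for synthetic data, while a score of zero is never chosen. An entropy threshold on `label_entropy` could then serve as one possible filter for candidates whose neighbourhoods are too mixed.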

artificial intelligence, deep learning, machine learning, (18 more...)

2503.12239

Country:

Europe > Portugal > Coimbra > Coimbra (0.05)
Asia > China > Beijing > Beijing (0.04)
Africa > Middle East > Algeria > Tiaret Province > Tiaret (0.04)
(4 more...)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (0.52)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Fréchet regression for multi-label feature selection with implicit regularization

Mansouri, Dou El Kefel, Benkabou, Seif-Eddine, Benabdeslem, Khalid

arXiv.org Machine Learning, Dec-24-2024

Fréchet regression, an extension of classical linear regression to general metric spaces, offers a robust framework for modeling complex relationships between variables when the responses lie outside of Euclidean spaces. This approach is especially well suited to high-dimensional datasets, such as vector representations, with particular relevance to fields like imaging, where capturing nonlinear dependencies and the intrinsic data structure is critical for accurate modeling (Fréchet (1948), Petersen and Müller (2019), Bhattacharjee and Müller (2023), Qiu, Yu and Zhu (2024)). A significant consideration in Fréchet regression arises when predicting multiple responses simultaneously, as seen in multi-target or multidimensional problems (Zhang and Zhou (2007), Hyvönen, Jääsaari and Roos (2024)). Unlike traditional regression, where each observation corresponds to a single response, Fréchet regression can be extended to model complex interactions between multiple outputs. This ability to address complex relationships between several responses opens new avenues, particularly in fields such as bioinformatics (Huang et al. (2005)) and image analysis (Lathuilière et al. (2019)), where multidimensional data and interdependencies between responses require adaptive and specialized methodologies. However, to date, the handling of multilabel scenarios within the context of Fréchet regression remains relatively unexplored in the literature, despite its potential significance in addressing complex, multidimensional applications. In this paper, we present an extension of the Global Fréchet regression model, a specific variant of Fréchet regression that generalizes classical multiple linear regression by modeling responses as random objects. This extension enables the explicit modeling of relationships between input variables and multiple responses, thereby addressing the multi-label setting. 
Our second contribution in this paper addresses the dimensionality challenge in the context of the proposed Fréchet regression extension.
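For orientation, the Global Fréchet regression model that the abstract extends is usually stated as follows (the formulation of Petersen and Müller (2019), recalled here as background; the notation is an assumption and may differ from the paper's, and the multi-label extension itself is not reproduced). For a predictor $X \in \mathbb{R}^p$ with mean $\mu$ and covariance $\Sigma$, and a response $Y$ taking values in a metric space $(\Omega, d)$:

```latex
\[
  m_{\oplus}(x) \;=\; \operatorname*{arg\,min}_{\omega \in \Omega}
  \; \mathbb{E}\!\left[\, s(X, x)\, d^{2}(Y, \omega) \,\right],
  \qquad
  s(X, x) \;=\; 1 + (X - \mu)^{\top} \Sigma^{-1} (x - \mu).
\]
```

When $\Omega = \mathbb{R}$ with the usual distance, this weighted Fréchet mean reduces to the classical multiple linear regression function, which is the sense in which the model generalizes linear regression to random objects.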

artificial intelligence, machine learning, regression, (17 more...)

2412.18247

Country:

Europe > France (0.04)
Africa > Middle East > Algeria > Tiaret Province > Tiaret (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.75)

KEDformer: Knowledge Extraction Seasonal Trend Decomposition for Long-term Sequence Prediction

Qin, Zhenkai, Wei, Baozhong, Gao, Caifeng, Ni, Jianyuan

arXiv.org Machine Learning, Dec-6-2024

Time series forecasting is a critical task in domains such as energy, finance, and meteorology, where accurate long-term predictions are essential. While Transformer-based models have shown promise in capturing temporal dependencies, their application to extended sequences is constrained by computational inefficiency and limited generalization. In this study, we propose KEDformer, a knowledge extraction-driven framework that integrates seasonal-trend decomposition to address these challenges. KEDformer leverages knowledge extraction methods that focus on the most informative weights within the self-attention mechanism to reduce computational overhead. Additionally, the framework decouples time series into seasonal and trend components, which enhances the model's ability to capture both short-term fluctuations and long-term patterns. Extensive experiments on five public datasets from the energy, transportation, and weather domains demonstrate the effectiveness and competitiveness of KEDformer, providing an efficient solution for long-term time series forecasting.
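A common way to split a series into trend and seasonal parts is a centred moving average, sketched below. The abstract does not say which decomposition KEDformer uses, so this is only a generic illustration; the function name `decompose` and the edge-padding choice are assumptions.

```python
def decompose(series, window):
    """Split a series into (trend, seasonal) with a centred moving average.

    The trend is the moving average of the series, with the first and last
    values repeated at the edges so the output keeps the input length.
    The seasonal part is the residual: series minus trend.
    """
    if not series:
        raise ValueError("series must not be empty")
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    padded = [series[0]] * half + list(series) + [series[-1]] * half
    trend = [sum(padded[i:i + window]) / window for i in range(len(series))]
    seasonal = [x - t for x, t in zip(series, trend)]
    return trend, seasonal


# A rising line with an alternating +1/-1 wiggle on top.
values = [i + (1 if i % 2 == 0 else -1) for i in range(12)]
trend, seasonal = decompose(values, window=5)
```

By construction the two components add back up to the original series, so a model can process the slow-moving trend and the faster fluctuations separately and recombine its outputs.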

data mining, forecasting, machine learning, (17 more...)

2412.05421

Country:

North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.05)
North America > United States > California (0.04)
Europe > Greece (0.04)
(2 more...)

Genre: Research Report > New Finding (0.48)

Industry: Energy (0.93)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Intelligent Video Recording Optimization using Activity Detection for Surveillance Systems

Elmir, Youssef, Touati, Hayet, Melizou, Ouassila

arXiv.org Artificial Intelligence, Nov-4-2024

Surveillance systems often struggle with managing vast amounts of footage, much of which is irrelevant, leading to inefficient storage and challenges in event retrieval. This paper addresses these issues by proposing an optimized video recording solution focused on activity detection. The proposed approach utilizes a hybrid method that combines motion detection via frame subtraction with object detection using YOLOv9. This strategy specifically targets the recording of scenes involving human or car activity, thereby reducing unnecessary footage and optimizing storage usage. The developed model demonstrates superior performance, achieving precision metrics of 0.855 for car detection and 0.884 for person detection, and reducing the storage requirements by two-thirds compared to traditional surveillance systems that rely solely on motion detection. This significant reduction in storage highlights the effectiveness of the proposed approach in enhancing surveillance system efficiency. Nonetheless, some limitations persist, particularly the occurrence of false positives and false negatives in adverse weather conditions, such as strong winds.
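The hybrid idea in the abstract, cheap frame subtraction first and object detection only when something moved, can be sketched as follows. This is an illustrative outline, not the authors' implementation: frames are plain nested lists of grayscale values, the thresholds are arbitrary, and `detect_objects` is a hypothetical callable standing in for a detector such as YOLOv9 (no real detector API is used here).

```python
def motion_fraction(prev_frame, curr_frame, pixel_threshold=25):
    """Fraction of pixels whose absolute grayscale difference exceeds a threshold."""
    total = 0
    changed = 0
    for prev_row, curr_row in zip(prev_frame, curr_frame):
        for p, c in zip(prev_row, curr_row):
            total += 1
            if abs(p - c) > pixel_threshold:
                changed += 1
    return changed / total if total else 0.0


def should_record(prev_frame, curr_frame, detect_objects,
                  min_fraction=0.01, targets=("person", "car")):
    """Record only if motion is present AND a target class is detected.

    detect_objects is a hypothetical callable: frame -> iterable of labels.
    It is invoked only when frame subtraction reports enough motion.
    """
    if motion_fraction(prev_frame, curr_frame) < min_fraction:
        return False
    return any(label in targets for label in detect_objects(curr_frame))


# Two 4x4 frames that differ in a single pixel.
background = [[0] * 4 for _ in range(4)]
moved = [row[:] for row in background]
moved[0][0] = 255

record = should_record(background, moved, lambda frame: ["person"])
```

Gating the detector behind the motion check keeps the expensive model idle on static scenes, while the class filter discards motion that involves neither people nor cars.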

artificial intelligence, detection, machine learning, (16 more...)

2411.02632

Country:

Africa > Middle East > Algeria > Béjaïa Province > Béjaïa (0.05)
North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
Africa > Middle East > Algeria > Tiaret Province > Tiaret (0.04)
Africa > Middle East > Algeria > El Oued Province > El Oued (0.04)

Genre: Research Report > New Finding (0.68)

Industry:

Information Technology > Security & Privacy (1.00)
Commercial Services & Supplies > Security & Alarm Services (0.93)

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Flickr Africa: Examining Geo-Diversity in Large-Scale, Human-Centric Visual Data

Naggita, Keziah, LaChance, Julienne, Xiang, Alice

arXiv.org Artificial Intelligence, Aug-16-2023

Biases in large-scale image datasets are known to influence the performance of computer vision models as a function of geographic context. To investigate the limitations of standard Internet data collection methods in low- and middle-income countries, we analyze human-centric image geo-diversity on a massive scale using geotagged Flickr images associated with each nation in Africa. We report the quantity and content of available data with comparisons to population-matched nations in Europe as well as the distribution of data according to fine-grained intra-national wealth estimates. Temporal analyses are performed at two-year intervals to expose emerging data trends. Furthermore, we present findings for an "othering" phenomenon as evidenced by a substantial number of images from Africa being taken by non-local photographers. The results of our study suggest that further work is required to capture image data representative of African people and their environments and, ultimately, to improve the applicability of computer vision models in a global context.

artificial intelligence, geotagged image, social media, (13 more...)

doi: 10.1145/3600211.3604659

2308.08656

Country:

Asia > Brunei (0.14)
North America > Canada > Quebec > Montreal (0.06)
Africa > Sierra Leone (0.06)
(142 more...)

Genre: Research Report > Experimental Study (0.66)

Industry:

Health & Medicine (0.92)
Information Technology > Services (0.75)
Government > Regional Government (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
